Abstract



By modelling the translation errors with suitable mathematical operators in the
crystal basis model of the genetic code, and requiring that codons prone to be
misread encode the same amino-acid, the main features of the organisation of the
genetic code in multiplets are described.



DSF-TH-34/2001 



I. INTRODUCTION 



The storage of genetic information is governed by DNA, which is formed by four different
nucleotides, characterized by their bases: adenine (A) and guanine (G), deriving from purine,
and cytosine (C) and thymine (T), deriving from pyrimidine. In the double helix structure of
DNA there is always a pairing C-G and T-A. The transmission of information from DNA to
build proteins is a complex process of transcription and translation. The flow of information
from DNA is transmitted through RNA, which also contains 4 bases, T being replaced by
uracil (U). The mRNA (messenger) transmits the information from DNA to the tRNA, which
takes part in protein synthesis. The transcription from DNA to mRNA follows
the base complementarity rule: A → U, T → A, G → C, C → G. In this context the idea of the
genetic code emerges, giving the law which translates a sequence of nucleotides in the RNA
(here and in the following we refer to messenger RNA) into a sequence of amino-acids (a.a.).
A triple of nucleotides, called a codon, is read in the translation process. There are 64 codons,
which encode the 20 a.a. (denoted in the following by their standard abbreviations), the
building blocks of any protein, and the signal for the end of protein synthesis
(Stop or Non-sense codons). It follows that there is no one-to-one correspondence between
codons and a.a.: the code is degenerate, exhibiting a complex pattern of organisation in
multiplets, ranging from sextets to singlets. In particular: for the vertebrate mitochondrial code
(VMC), 2 sextets, 7 quartets and 12 doublets; for the eukaryotic, or standard universal, code
(SUC), 3 sextets, 5 quartets, 2 triplets, 9 doublets and 2 singlets (see Table II). For a recent
review with a rich bibliography of the original papers see [?]. Codons encoding the same a.a.
are called synonymous. Since the discovery of the genetic code three puzzling questions, among
many others, have arisen: why does nature use 20 a.a. to build proteins? Why does
the genetic code have its peculiar organisation in multiplets? Why is one a.a. encoded by fewer
codons than another? An answer to the last question might be that the a.a. used more frequently
are encoded by larger multiplets. However, such an explanation is only weakly supported by the
analysis of the data, see Table I and [?]. In particular, the sextet encoding Arg seems
overabundant.
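The multiplet counts quoted above can be checked with a short tally. The sketch below is only an illustration: it assumes the compact one-letter codon tables in the conventional T, C, A, G ordering (T standing for U), and it counts the Stop signal as one more encoded symbol, which is how the 7 quartets of the VMC and the 2 triplets of the SUC arise. The names `STANDARD`, `VERT_MITO` and `multiplet_pattern` are introduced here for the example.

```python
from collections import Counter

BASES = "TCAG"  # conventional ordering of the compact tables (T stands for U)

# One letter per codon, '*' = Stop, codons ordered as in BASES x BASES x BASES.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def multiplet_pattern(table):
    """Return {multiplet size: number of multiplets} for a 64-letter code table."""
    codons = [a + b + c for a in BASES for b in BASES for c in BASES]
    code = dict(zip(codons, table))          # codon -> amino acid (or Stop)
    codons_per_symbol = Counter(code.values())
    return dict(Counter(codons_per_symbol.values()))


standard_pattern = multiplet_pattern(STANDARD)
vmc_pattern = multiplet_pattern(VERT_MITO)
print("SUC:", sorted(standard_pattern.items(), reverse=True))
print("VMC:", sorted(vmc_pattern.items(), reverse=True))
```

In both cases the multiplet sizes add up to 64, as they must.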

To explain the pattern of the genetic code at least six hypotheses have been put forward [?]:

1. the frozen accident theory, according to which the genetic code is the result of a random
event [?]

2. the stereochemical theory, whose first formulation dates back to 1954, only one year after the
discovery of the structure of DNA by Crick and Watson, and was suggested by Gamow [?], according
to which the codon assignments are the result of an affinity between the a.a. and the
encoding codons

3. the coevolution theory, according to which the most closely related a.a. are encoded by close
codons, i.e. codons differing by the change of one base

4. the lethal mutation theory, according to which the genetic code has evolved minimizing
the effect of point mutations of the codons on proteins [10], [11]

5. the translation-error minimization theory, according to which the evolution of the genetic
code has been governed by the minimization of the errors in the translation process

6. the genetic flexibility theory, according to which the genetic code is the outcome of a
balance between robustness and mutability [?]



However, to my knowledge no quantitative model has been proposed to account either for
the number of a.a. or for the structure in multiplets. Efforts have concentrated on
analysing the correspondence between codons and a.a. On this subject a very large literature
exists; it is now generally accepted that this correspondence is not accidental, but reflects the
molecular structure of the a.a., even if there is a great debate over the nature of the dominant
factors. However, in almost all the papers on the subject the number of a.a. and the structure in
multiplets of the genetic code are assumed as input elements. The aim of this talk is to propose a
mathematical model able to reasonably explain the organisation of the genetic code, i.e. the
number and the dimension of multiplets. In a way the proposed model provides a mathematical
framework for hypotheses 4) and 5). Indeed, a protection against translation errors is clearly
obtained if the codons more prone to be misread encode the same a.a. The first requirement for
setting up such a mathematical model is to identify codons as mathematical objects; this will be
done in the framework of the crystal basis model of the genetic code [?], which will be briefly
recalled in Sec. 3. To make the paper self-contained, in Sec. 2 we recall the crystal basis for
$U_{q \to 0}(sl(2))$. In Sec. 4 the consequences of the model will be given without details,
which can be found in [?].


II. REMINDER OF $U_{q \to 0}(sl(2))$

For self-consistency, let me recall the definition and main properties of $U_{q \to 0}(sl(2))$ and of
the q-tensor operator, see, e.g., [?]. $U_q(sl(2))$ is defined by the following commutation relations

$$[J_+, J_-] = [2J_3]_q \tag{1}$$

$$[J_3, J_\pm] = \pm J_\pm \tag{2}$$

where

$$[x]_q = \frac{q^x - q^{-x}}{q - q^{-1}} \tag{3}$$

In the following we shall omit the lower label q.

The deformed enveloping algebra $U_q(sl(2))$ is endowed with a Hopf structure. In particular
the coproduct is defined by

$$\Delta(J_3) = J_3 \otimes 1 + 1 \otimes J_3$$

$$\Delta(J_\pm) = J_\pm \otimes q^{J_3} + q^{-J_3} \otimes J_\pm \tag{4}$$

The Casimir operator can be written

$$C = J_+ J_- + [J_3][J_3 - 1] = J_- J_+ + [J_3][J_3 + 1] \tag{5}$$

For q generic, i.e. not a root of unity, the irreducible representations (irreps.) are labelled by an
integer or half-integer number j, and the action of the generators on the basis vectors $|jm\rangle$
($-j \le m \le j$) of the irrep. is

$$J_3 |jm\rangle = m\, |jm\rangle \tag{6}$$

$$J_\pm |jm\rangle = \sqrt{[j \mp m][j \pm m + 1]}\; |j, m \pm 1\rangle = F_\pm(j,m)\, |j, m \pm 1\rangle \tag{7}$$

From eqs. (5)-(7) it follows

$$C |jm\rangle = [j][j+1]\, |jm\rangle \tag{8}$$

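Let us recall the definition of a q-tensor operator for $U_q(sl(2))$. An irreducible q-tensor of rank
j is a family of 2j + 1 operators $T^j_m$ ($-j \le m \le j$) which transform under the action of the
generators of $U_q(sl(2))$ as

$$q^{J_3}(T^j_m) \equiv q^{J_3}\, T^j_m\, q^{-J_3} = q^m\, T^j_m \tag{9}$$

or

$$[J_3, T^j_m] = m\, T^j_m \tag{10}$$

$$J_\pm(T^j_m) \equiv J_\pm\, T^j_m\, q^{J_3} - q^{J_3 \mp 1}\, T^j_m\, J_\pm = F_\pm(j,m)\, T^j_{m \pm 1} \tag{11}$$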

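In deriving the above equations use has been made of the non-trivial coproduct eq. (4). The
q-Wigner-Eckart (q-WE) theorem now reads:

$$\langle JM | T^j_m | j_1 m_1 \rangle = (-1)^{2j}\, \frac{\langle J \| T^j \| j_1 \rangle}{\sqrt{[2J+1]}}\, \langle j_1 m_1\, j m | JM \rangle \tag{12}$$

or

$$T^j_m |j_1 m_1\rangle = (-1)^{2j} \sum_J \frac{\langle J \| T^j \| j_1 \rangle}{\sqrt{[2J+1]}}\, \langle j_1 m_1\, j m | JM \rangle\, |JM\rangle \tag{13}$$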

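Let us now study the limit $q \to 0$. From the definition eq. (3) we have

$$[x]_{q \to 0} \sim q^{-x+1} \tag{14}$$

So it follows that

$$F_\pm(j,m)_{q \to 0} \sim q^{-j+1/2} \tag{15}$$

$$[j][j+1]_{q \to 0} \sim q^{-2j+1} \tag{16}$$

From eqs. (7) and (15) it follows that the action of the generators $J_\pm$ is not defined in the limit
$q \to 0$. Let us define

$$\tilde{J}_\pm = \Gamma_0\, J_\pm \tag{17}$$

where

$$\Gamma_0 = C^{-1/2} \tag{18}$$

$$\Gamma_0 |jm\rangle = ([j][j+1])^{-1/2}\, |jm\rangle \;\sim_{q \to 0}\; q^{j-1/2}\, |jm\rangle \tag{19}$$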

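These operators are well behaved for $q \to 0$. Their action in the limit $q \to 0$ defines the
crystal basis:

$$\tilde{J}_+ |jm\rangle = |j, m+1\rangle \quad \text{for } -j \le m < j \tag{20}$$

$$\tilde{J}_- |jm\rangle = |j, m-1\rangle \quad \text{for } -j < m \le j \tag{21}$$

$$\tilde{J}_+ |jj\rangle = \tilde{J}_- |j, -j\rangle = 0 \tag{22}$$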
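The limit behind the crystal basis can be checked numerically. The following sketch evaluates the q-number $[x]_q = (q^x - q^{-x})/(q - q^{-1})$ and the coefficient $F_\pm(j,m)/\sqrt{[j][j+1]}$ of the rescaled generators for a small value of q; the function names are introduced here for illustration only, and the choice q = 10^-3 is an arbitrary assumption.

```python
import math


def qnum(x, q):
    """q-number [x]_q = (q^x - q^-x) / (q - q^-1)."""
    return (q**x - q**(-x)) / (q - 1.0 / q)


def rescaled_coefficient(j, m, sign, q):
    """Coefficient of |j, m + sign> in Gamma_0 J_(sign) |j m>.

    F_(sign)(j, m) = sqrt([j - sign*m][j + sign*m + 1]) is divided by
    sqrt([j][j + 1]), the eigenvalue of C^(1/2) on the irrep. j.
    """
    f = math.sqrt(qnum(j - sign * m, q) * qnum(j + sign * m + 1, q))
    return f / math.sqrt(qnum(j, q) * qnum(j + 1, q))


q = 1e-3
# [x] ~ q^(1 - x): the ratio should be close to 1 for small q.
print(qnum(2, q) / q**(1 - 2))
# Raising |3/2, 1/2>: the coefficient tends to 1.
print(rescaled_coefficient(1.5, 0.5, +1, q))
# Raising the highest weight state |3/2, 3/2>: the coefficient vanishes.
print(rescaled_coefficient(1.5, 1.5, +1, q))
```

The first two printed numbers approach 1 as q decreases, while the last one is zero because $[0]_q = 0$, in agreement with the three cases of the crystal basis action.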

It is also possible to define a Casimir operator in the crystal basis

$$C = (J_3)^2 + \frac{1}{2} \sum_{n \in Z_+} \sum_{k=0}^{n} (\tilde{J}_-)^{n-k} (\tilde{J}_+)^{n} (\tilde{J}_-)^{k} \tag{23}$$

such that

$$C |jm\rangle = j(j+1)\, |jm\rangle \tag{24}$$

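Then I can define [18] a ($q \to 0$)-tensor or crystal operator $\tau^j_m$ by:

$$J_3(\tau^j_m) = m\, \tau^j_m \qquad \tilde{J}_\pm(\tau^j_m) = \tau^j_{m \pm 1} \tag{25}$$

Clearly, if |m| > j then $\tau^j_m$ has to be considered vanishing. In the following I shall omit
writing the tilde explicitly. The tensor product of two representations in the crystal basis is given
by [19].

Theorem - If $B_1$ and $B_2$ are the crystal bases of the $U_{q \to 0}(sl(2))$-modules $M_1$ and $M_2$,
for $u \in B_1$ and $v \in B_2$ we have:

$$\tilde{J}_-(u \otimes v) = \begin{cases} \tilde{J}_- u \otimes v & \exists\, n \ge 1 \text{ such that } \tilde{J}_-^n u \ne 0 \text{ and } \tilde{J}_+^n v = 0 \\ u \otimes \tilde{J}_- v & \text{otherwise} \end{cases} \tag{26}$$

$$\tilde{J}_+(u \otimes v) = \begin{cases} u \otimes \tilde{J}_+ v & \exists\, n \ge 1 \text{ such that } \tilde{J}_+^n v \ne 0 \text{ and } \tilde{J}_-^n u = 0 \\ \tilde{J}_+ u \otimes v & \text{otherwise} \end{cases} \tag{27}$$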

So the tensor product of two crystal bases is a crystal basis, and the states of the basis of the
tensor space are pure states. In other words, in the limit $q \to 0$ all the q-Clebsch-Gordan (q-CG)
coefficients vanish except one, which is equal to ±1. The Wigner-Eckart theorem eq. (13) now
reads

$$\tau^j_m |j_1 m_1\rangle = \langle J \| \tau^j \| j_1 \rangle\, |J, m_1 + m\rangle \tag{28}$$

where the value of J ($|j_1 - j| \le J \le j_1 + j$) depends on the values of $m_1$ and m and on the order
in which the tensor product of the irreps. (jm) and ($j_1 m_1$) is taken. In the following the irrep.
(jm) will be considered as the second one.
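The tensor product rule of the theorem can be tried out on words of spin-1/2 states. The sketch below is an illustration under stated assumptions, not the procedure of the original references: each factor is a sign ±1 (twice the value of $J_3$), "there exists n ≥ 1 such that ..." is implemented by comparing how many times the lowering and raising operators can be applied, and the nucleotide labels are those of the crystal basis model (C = (+,+), U = (-,+), G = (+,-), A = (-,-) for the H and V signs). All function names are introduced for this example.

```python
from collections import Counter
from itertools import product

# Nucleotide -> (sign of J_{3,H}, sign of J_{3,V}), in units of 1/2.
LABELS = {"C": (+1, +1), "U": (-1, +1), "G": (+1, -1), "A": (-1, -1)}
NUCLEOTIDE = {signs: n for n, signs in LABELS.items()}


def lower(w):
    """Crystal lowering operator on a word of signs; None stands for 0."""
    if len(w) == 1:
        return (-1,) if w == (+1,) else None
    u, v = w[:-1], w[-1:]
    if phi(u) > eps(v):  # some n has J_-^n u != 0 and J_+^n v = 0
        return lower(u) + v
    lv = lower(v)
    return None if lv is None else u + lv


def raise_(w):
    """Crystal raising operator on a word of signs; None stands for 0."""
    if len(w) == 1:
        return (+1,) if w == (-1,) else None
    u, v = w[:-1], w[-1:]
    if eps(v) > phi(u):  # some n has J_+^n v != 0 and J_-^n u = 0
        return u + raise_(v)
    ru = raise_(u)
    return None if ru is None else ru + v


def _steps(op, w):
    """Number of times op can be applied to w before reaching 0."""
    n = 0
    w = op(w)
    while w is not None:
        n += 1
        w = op(w)
    return n


def phi(w):
    return _steps(lower, w)


def eps(w):
    return _steps(raise_, w)


def two_j(codon):
    """(2 J_H, 2 J_V) of the irrep. containing the codon."""
    h = tuple(LABELS[n][0] for n in codon)
    v = tuple(LABELS[n][1] for n in codon)
    return (phi(h) + eps(h), phi(v) + eps(v))


def lower_h(codon):
    """Codon reached by the H lowering operator, or None."""
    new_h = lower(tuple(LABELS[n][0] for n in codon))
    if new_h is None:
        return None
    return "".join(NUCLEOTIDE[(s, LABELS[n][1])] for s, n in zip(new_h, codon))


codons = ["".join(c) for c in product("CUGA", repeat=3)]
irrep_sizes = Counter(two_j(c) for c in codons)
print(two_j("CCC"), lower_h("CCC"), dict(irrep_sizes))
```

Each of the four label pairs (3/2, 3/2), (3/2, 1/2), (1/2, 3/2) and (1/2, 1/2) collects 16 codons, as expected from the decomposition of the 3-fold tensor product of (1/2, 1/2) into 1, 2, 2 and 4 copies of irreps. of dimension 16, 8, 8 and 4.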



III. THE MATHEMATICAL MODEL 

In the crystal basis model of the genetic code [?] the 4 nucleotides are assigned to the
4-dim fundamental irrep. (1/2, 1/2) of $U_{q \to 0}(sl(2) \oplus sl(2))$, with the following assignment for
the values of the third component of J for the two sl(2), which in the following will be denoted
as $sl_H(2)$ and $sl_V(2)$:

$$C \equiv (+\tfrac{1}{2}, +\tfrac{1}{2}) \qquad T/U \equiv (-\tfrac{1}{2}, +\tfrac{1}{2}) \qquad G \equiv (+\tfrac{1}{2}, -\tfrac{1}{2}) \qquad A \equiv (-\tfrac{1}{2}, -\tfrac{1}{2}) \tag{29}$$

and the codons, triples of nucleotides, to the 3-fold tensor product of (1/2, 1/2). We report
in Table II the assignment of the codons to the different irreps. The mathematical model
mimicking the translation errors is essentially based on the following Assumption:



Two codons are prone to translation error if their corresponding states in the
crystal basis model are connected by the action of a suitable crystal tensor operator
$\tau^{j}_{H,m} \otimes \tau^{j'}_{V,m'}$ of $U_{q \to 0}(sl_H(2) \oplus sl_V(2))$ in the sense of the Wigner-Eckart theorem.

We assume, on phenomenological grounds, see [20], [21], that there is a hierarchy in the
occurrence of translation errors and, in order of decreasing intensity, we consider:

1. the transitions, in particular C → U or G → A, concerning nucleotides in the 3rd position

2. the transversions, in particular C → G, U → A and C → A, concerning nucleotides in the 3rd
position

3. the transitions (resp. transversions) concerning nucleotides in 1st position

4. the transitions (resp. transversions) concerning nucleotides in 2nd position

5. the mutation induced by the transitions (resp. transversions) on the first two nucleotides

Transitions (transversions) of the nucleotide in the middle position are far weaker than
transitions (transversions) in other positions.

The hierarchy in the translation error mechanisms means that a multiplet formed at one level
is frozen; at the subsequent levels, the merging of two whole multiplets into a larger structure is
possible if it is induced by the relevant tensor operator. If the transition is allowed only for
some members of a multiplet, there is a conflict between merging the multiplet into a
larger one, thereby decreasing the variety of encoded a.a. but increasing the protection, and
preserving the multiplets, thereby decreasing the level of protection. In this case, the formation of
larger structures will generally take place or not according to the rule of protecting the weakest
codons, i.e. the codons more inclined to be misread. I assume that misreading of nucleotide C or A
is the most common. Let me emphasize that we want to build the simplest model in which the codons
most subject to reading errors are synonymous; in this spirit the explicitly analysed
transitions (C → U, G → A) or transversions (C → G, U → A, C → A) are not to be
considered as the only possible misreadings, but as the representatives which allow the
simplest modelling. For simplicity I consider only the transversions decreasing or leaving
unchanged the value of $J_{H,3}$. Finally, it is clear from Table II that there is generally
more than one irrep. labelled by the same value of ($J_H$, $J_V$), whose content in the constituent
nucleotides is different. The transformation properties of a crystal tensor operator determine
which states are related to each other only according to the irreps. to which the states belong.
To take the multiplicity of irreps. into account in some way, the choice of the rank
of the tensor operator will generally depend on the position of the misread nucleotide and on the
irrep. to which the codon belongs. The transitions and the transversions are modelled by the
following crystal tensor operators, the value of the component being determined by the labels
of the nucleotides, see eq. (29):



$$C \to U \ \text{or} \ G \to A: \qquad \tau^{1}_{H,-1} \otimes \tau^{a}_{V,0} \tag{30}$$

$$C \to G \ \text{or} \ U \to A: \qquad \tau^{b}_{H,0} \otimes \tau^{1}_{V,-1} \tag{31}$$

$$C \to A: \qquad \tau^{c}_{H,-1} \otimes \tau^{d}_{V,-1} \tag{32}$$

where the values of the ranks a, b, c and d depend on the position inside the codon of the
misread nucleotide and on the irreps. to which the codons belong, see next section. The above
choice for the horizontal (resp. vertical) part of the crystal vector operator in eq. (30) (resp.
eq. (31)) is indeed the simplest choice consistent with the change in the labels of the states of
codons for transitions (resp. transversions), see eq. (29). The choice of the rank of the vertical
(resp. horizontal) part of the crystal operator in eq. (30) (resp. eq. (31)), as well as of the tensor
operator modelling the transversion C → A, is somewhat arbitrary. The value of the rank
of the operator modelling the translation errors in 2nd position will generally be assumed
larger than the one describing errors in 1st position, and the latter will generally be assumed
larger than the one describing errors in 3rd position, so as to model the less frequent misreading.
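The components of the operators in eqs. (30)-(32) are fixed by the change in the nucleotide labels of eq. (29). A minimal check, with names introduced here only for illustration, computes the change of ($J_{3,H}$, $J_{3,V}$) for each misreading:

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Eq. (29): nucleotide -> (J_{3,H}, J_{3,V})
LABELS = {
    "C": (+HALF, +HALF),
    "U": (-HALF, +HALF),
    "G": (+HALF, -HALF),
    "A": (-HALF, -HALF),
}


def component(src, dst):
    """Change (delta J_{3,H}, delta J_{3,V}) when src is misread as dst."""
    (h0, v0), (h1, v1) = LABELS[src], LABELS[dst]
    return (h1 - h0, v1 - v0)


for src, dst in [("C", "U"), ("G", "A"), ("C", "G"), ("U", "A"), ("C", "A")]:
    dh, dv = component(src, dst)
    print(f"{src} -> {dst}: H component {dh}, V component {dv}")
```

Transitions change only the H label by -1, the transversions C → G and U → A change only the V label by -1, and C → A changes both, which matches the lower indices of the operators in eqs. (30), (31) and (32).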



IV. OUTCOME OF THE MODELISATION OF TRANSLATION ERRORS 

In the following I use the standard notation: X, Z, N denote any nucleotide, Y = C, U
(pyrimidine), R = G, A (purine).

1. Misreading of 3rd nucleotide

The transitions in 3rd position in the codons XZC and XZG are modelled by the operator
given by eq. (30) with a = 0 (in the following the equations have to be read, by the western
rule, from left to right):

$$\psi(XZC) \circ (\tau^{1}_{H,-1} \otimes \tau^{0}_{V,0}) \Longrightarrow \psi(XZU) \tag{33}$$

$$\psi(XZG) \circ (\tau^{1}_{H,-1} \otimes \tau^{0}_{V,0}) \Longrightarrow \psi(XZA) \tag{34}$$

We get the splitting of the 64 codons into 32 doublets of the form XZR and XZY.

The transversions in 3rd position in the codons XZC and XZU are modelled by the
following operators:

$$\psi(XZC) \circ (\tau^{b}_{H,0} \otimes \tau^{1}_{V,-1}) \Longrightarrow \psi(XZG) \tag{35}$$

$$\psi(XZU) \circ (\tau^{b}_{H,0} \otimes \tau^{1}_{V,-1}) \Longrightarrow \psi(XZA) \tag{36}$$

$$\psi(XZC) \circ (\tau^{b}_{H,-1} \otimes \tau^{1}_{V,-1}) \Longrightarrow \psi(XZA) \tag{37}$$

where in eqs. (35), (36), (37) b = 2 if the first two nucleotides (dinucleotide) XZ are: CA,
GA, CG, UG, UA, UU, AU, AA, GG, AG, and b = 1 otherwise.

We get the merging of 16 doublets into 8 quartets, the quartets being the codons whose
first two nucleotides are: CC, CU, CG, UC, GG, GC, GU, and AC.
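The counting at this first level can be reproduced with a few lines of bookkeeping. The sketch below does not implement the tensor operators; it simply assumes the outcome stated in the text (third-position transitions pair XZC with XZU and XZG with XZA, and the eight listed dinucleotides merge their two doublets) and tallies the resulting multiplets. The variable names are illustrative.

```python
from collections import Counter
from itertools import product

PYRIMIDINES = set("CU")
# Dinucleotides XZ whose doublets XZY and XZR merge into a quartet XZN,
# as listed in the text.
QUARTET_DINUCLEOTIDES = {"CC", "CU", "CG", "UC", "GG", "GC", "GU", "AC"}

codons = ["".join(c) for c in product("CUGA", repeat=3)]

# Level 1: transitions in 3rd position give doublets XZY and XZR.
doublets = {}
for codon in codons:
    third_class = "Y" if codon[2] in PYRIMIDINES else "R"
    doublets.setdefault(codon[:2] + third_class, []).append(codon)

# Level 2: transversions in 3rd position merge XZY and XZR for the listed XZ.
multiplets = {}
for name, members in doublets.items():
    xz = name[:2]
    key = xz + "N" if xz in QUARTET_DINUCLEOTIDES else name
    multiplets.setdefault(key, []).extend(members)

pattern = Counter(len(members) for members in multiplets.values())
print(len(doublets), dict(pattern))
```

The tally gives 32 doublets after the first step, and 8 quartets plus 16 doublets after the second.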

2. Misreading of 1st nucleotide

The transitions are modelled by the operators of eq. (30) with a = 1:

$$\psi(CXN) \circ (\tau^{1}_{H,-1} \otimes \tau^{1}_{V,0}) \Longrightarrow \psi(UXN)$$

$$\psi(GXN) \circ (\tau^{1}_{H,-1} \otimes \tau^{1}_{V,0}) \Longrightarrow \psi(AXN) \tag{38}$$

We get the merging of the doublet UUR and the quartet CUN into a sextet (encoding Leu).
The transversions in first position are modelled by the operators:

$$\psi(CXZ) \circ (\tau^{c}_{H,0} \otimes \tau^{1}_{V,-1}) \Longrightarrow \psi(GXZ) \tag{39}$$

$$\psi(UXZ) \circ (\tau^{c}_{H,0} \otimes \tau^{1}_{V,-1}) \Longrightarrow \psi(AXZ) \tag{40}$$

$$\psi(CXZ) \circ (\tau^{c}_{H,-1} \otimes \tau^{1}_{V,-1}) \Longrightarrow \psi(AXZ) \tag{41}$$

where c = 1 if the codons CXZ and UXZ belong to the same irrep. and c = 2 otherwise.
As a consequence the doublet AGR merges with the quartet CGN, forming another sextet
(encoding Arg).



3. Misreading of central nucleotide

The transitions are modelled by the operators of eq. (30) with a = 2:

$$\psi(XCZ) \circ (\tau^{1}_{H,-1} \otimes \tau^{2}_{V,0}) \Longrightarrow \psi(XUZ)$$

$$\psi(XGZ) \circ (\tau^{1}_{H,-1} \otimes \tau^{2}_{V,0}) \Longrightarrow \psi(XAZ) \tag{42}$$

No modification of the previously established pattern comes out.
We model the transversions as

$$\psi(XCZ) \circ (\tau^{c}_{H,0} \otimes \tau^{1}_{V,-1}) \Longrightarrow \psi(XGZ) \tag{43}$$

$$\psi(XUZ) \circ (\tau^{c}_{H,0} \otimes \tau^{1}_{V,-1}) \Longrightarrow \psi(XAZ) \tag{44}$$

$$\psi(XCZ) \circ (\tau^{c}_{H,-1} \otimes \tau^{1}_{V,-1}) \Longrightarrow \psi(XAZ) \tag{45}$$

where c = 1 if the codons XCZ and XUZ belong to the same irrep. and c = 2 otherwise.
It turns out that one should expect the fusion into a sextet of the quartet UCN and of the
doublet UGY. This sextet does not appear, but as we shall see below the quartet UCN
indeed merges with the doublet AGY. One should also expect the fusion into a sextet of the
quartet CGN and the doublet GAY, which indeed does not happen. Both these results
suggest that the misreading of the central nucleotide is indeed a very weak effect, if not
enhanced by the simultaneous misreading of the first nucleotide.

4. Misreading of two nucleotides

The transition and transversion of the first (second) nucleotide are modelled by the same
operators used for the transition or transversion of the first (second) nucleotide alone. In the
following we denote with a lower label the position of the nucleotide on which the operator acts.
The action of the two-nucleotide operators has to be computed in the following way: as a first
step one computes the action of the operator labelled by I, giving rise to a "virtual"
state with the labels assigned by the action of the relevant operator on the initial state of
the codon; then one considers the action of the operator labelled by II on the "virtual"
state and gets the labels of the final state. If these labels denote, see Table II, the state
corresponding to the codon, the misreading is allowed. As an example let us compute




(46) 



(i) ij{CCN) o (rl^, ® r^,)j 
(ii) ^{UCN)^^„)o{t}j^_^®tI 




(47) 



We get the merging of the doublet AGY with the quartet UCN, giving rise to the third sextet
(encoding Ser).
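The final multiplet pattern can be tallied from the merges described in this section. The sketch below is bookkeeping only and rests on two assumptions: the level reached after the third-position misreadings consists of 8 quartets and 16 doublets, and the three sextets are those of Leu (UUR with CUN), Arg (AGR with CGN) and Ser (AGY with UCN), as in the standard code. The names are illustrative.

```python
from collections import Counter

DINUCLEOTIDES = [a + b for a in "CUGA" for b in "CUGA"]
QUARTET_DINUCLEOTIDES = {"CC", "CU", "CG", "UC", "GG", "GC", "GU", "AC"}

# Multiplets after the misreading of the 3rd nucleotide: name -> size.
sizes = {}
for xz in DINUCLEOTIDES:
    if xz in QUARTET_DINUCLEOTIDES:
        sizes[xz + "N"] = 4
    else:
        sizes[xz + "Y"] = 2
        sizes[xz + "R"] = 2

# Merges induced by the misreading of the 1st nucleotide (Leu, Arg)
# and of the first two nucleotides (Ser).
MERGES = [("UUR", "CUN"), ("AGR", "CGN"), ("AGY", "UCN")]
for doublet, quartet in MERGES:
    sizes[quartet + "+" + doublet] = sizes.pop(quartet) + sizes.pop(doublet)

final_pattern = Counter(sizes.values())
print(dict(final_pattern))
```

The result is 3 sextets, 5 quartets and 13 doublets, which together account for all 64 codons.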



V. CONCLUSIONS 



The outcome of the proposed mathematical model is a pattern of organisation in 3 sextets,
5 quartets and 13 doublets, which is very close to the patterns of the VMC and SUC codes. In
particular it differs from the latter by the absence of the breaking of two doublets into
singlets, which may reasonably be seen as a minor effect. A more refined analysis, see [15],
indeed gives hints for:

• the breaking of the doublets 

• the choice of the Stop codons 

• the similarity of the physical chemical properties of a. a. encoded by codons prone to 
misreading 

The number and dimension of multiplets seem to be the outcome of a strategy aimed at
keeping as many different amino acids as possible with a reasonable protection against
translation errors. As a final remark, in [15] a discussion of the dependence of the results on the
choice of tensor operators mimicking the misreading shows that there is a dependence, but the
bulk of the results is left unmodified.
TABLES



TABLE I. Relative frequency (R.f.) ($10^{-3}$) of occurrence of the 20 amino-acids (a.a.) and the
number (N) of corresponding encoding codons.

a.a.  R.f.  N      a.a.  R.f.  N      a.a.  R.f.  N      a.a.  R.f.  N
Leu    91   6      Glu    62   2      Arg    51   6      Tyr    32   2
Ala    77   4      Thr    59   4      Pro    51   4      Met    24   1
Gly    74   4      Lys    59   2      Asn    43   2      His    23   2
Ser    69   6      Ile    53   3      Gln    41   2      Cys    20   2
Val    66   4      Asp    52   2      Phe    40   2      Trp    14   1



TABLE II. The vertebrate mitochondrial code. The upper label denotes different irreducible
representations. Amino acids marked with an asterisk are encoded differently in the eukaryotic or
standard code: UGA, AUA and AGR encode respectively Ter, Ile and Arg.

codon a.a.   (J_H, J_V)      J_3,H  J_3,V   |  codon a.a.   (J_H, J_V)      J_3,H  J_3,V
CCC   Pro    (3/2, 3/2)      +3/2   +3/2    |  UCC   Ser    (3/2, 3/2)      +1/2   +3/2
CCU   Pro    (1/2, 3/2)^1    +1/2   +3/2    |  UCU   Ser    (1/2, 3/2)^1    -1/2   +3/2
CCG   Pro    (3/2, 1/2)^1    +3/2   +1/2    |  UCG   Ser    (3/2, 1/2)^1    +1/2   +1/2
CCA   Pro    (1/2, 1/2)^1    +1/2   +1/2    |  UCA   Ser    (1/2, 1/2)^1    -1/2   +1/2
CUC   Leu    (1/2, 3/2)^2    +1/2   +3/2    |  UUC   Phe    (3/2, 3/2)      -1/2   +3/2
CUU   Leu    (1/2, 3/2)^2    -1/2   +3/2    |  UUU   Phe    (3/2, 3/2)      -3/2   +3/2
CUG   Leu    (1/2, 1/2)^3    +1/2   +1/2    |  UUG   Leu    (3/2, 1/2)^1    -1/2   +1/2
CUA   Leu    (1/2, 1/2)^3    -1/2   +1/2    |  UUA   Leu    (3/2, 1/2)^1    -3/2   +1/2
CGC   Arg    (3/2, 1/2)^2    +3/2   +1/2    |  UGC   Cys    (3/2, 1/2)^2    +1/2   +1/2
CGU   Arg    (1/2, 1/2)^2    +1/2   +1/2    |  UGU   Cys    (1/2, 1/2)^2    -1/2   +1/2
CGG   Arg    (3/2, 1/2)^2    +3/2   -1/2    |  UGG   Trp    (3/2, 1/2)^2    +1/2   -1/2
CGA   Arg    (1/2, 1/2)^2    +1/2   -1/2    |  UGA   Trp*   (1/2, 1/2)^2    -1/2   -1/2
CAC   His    (1/2, 1/2)^4    +1/2   +1/2    |  UAC   Tyr    (3/2, 1/2)^2    -1/2   +1/2
CAU   His    (1/2, 1/2)^4    -1/2   +1/2    |  UAU   Tyr    (3/2, 1/2)^2    -3/2   +1/2
CAG   Gln    (1/2, 1/2)^4    +1/2   -1/2    |  UAG   Ter    (3/2, 1/2)^2    -1/2   -1/2
CAA   Gln    (1/2, 1/2)^4    -1/2   -1/2    |  UAA   Ter    (3/2, 1/2)^2    -3/2   -1/2
GCC   Ala    (3/2, 3/2)      +3/2   +1/2    |  ACC   Thr    (3/2, 3/2)      +1/2   +1/2
GCU   Ala    (1/2, 3/2)^1    +1/2   +1/2    |  ACU   Thr    (1/2, 3/2)^1    -1/2   +1/2
GCG   Ala    (3/2, 1/2)^1    +3/2   -1/2    |  ACG   Thr    (3/2, 1/2)^1    +1/2   -1/2
GCA   Ala    (1/2, 1/2)^1    +1/2   -1/2    |  ACA   Thr    (1/2, 1/2)^1    -1/2   -1/2
GUC   Val    (1/2, 3/2)^2    +1/2   +1/2    |  AUC   Ile    (3/2, 3/2)      -1/2   +1/2
GUU   Val    (1/2, 3/2)^2    -1/2   +1/2    |  AUU   Ile    (3/2, 3/2)      -3/2   +1/2
GUG   Val    (1/2, 1/2)^3    +1/2   -1/2    |  AUG   Met    (3/2, 1/2)^1    -1/2   -1/2
GUA   Val    (1/2, 1/2)^3    -1/2   -1/2    |  AUA   Met*   (3/2, 1/2)^1    -3/2   -1/2
GGC   Gly    (3/2, 3/2)      +3/2   -1/2    |  AGC   Ser    (3/2, 3/2)      +1/2   -1/2
GGU   Gly    (1/2, 3/2)^1    +1/2   -1/2    |  AGU   Ser    (1/2, 3/2)^1    -1/2   -1/2
GGG   Gly    (3/2, 3/2)      +3/2   -3/2    |  AGG   Ter*   (3/2, 3/2)      +1/2   -3/2
GGA   Gly    (1/2, 3/2)^1    +1/2   -3/2    |  AGA   Ter*   (1/2, 3/2)^1    -1/2   -3/2
GAC   Asp    (1/2, 3/2)^2    +1/2   -1/2    |  AAC   Asn    (3/2, 3/2)      -1/2   -1/2
GAU   Asp    (1/2, 3/2)^2    -1/2   -1/2    |  AAU   Asn    (3/2, 3/2)      -3/2   -1/2
GAG   Glu    (1/2, 3/2)^2    +1/2   -3/2    |  AAG   Lys    (3/2, 3/2)      -1/2   -3/2
GAA   Glu    (1/2, 3/2)^2    -1/2   -3/2    |  AAA   Lys    (3/2, 3/2)      -3/2   -3/2



